City College of San Francisco
MATH 108 - Foundations of Data Science
Lecture 39: Updating Predictions
Associated Textbook Sections: 18.0 - 18.2
Outline
Set Up the Notebook
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
Decisions
Decisions Under Uncertainty
Interpretation by Physicians of Clinical Laboratory Results (1978)
We asked 20 house officers, 20 fourth-year medical students and 20 attending physicians, selected in 67 consecutive hallway encounters at four Harvard Medical School teaching hospitals, the following question:
If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person's symptoms or signs?
Eleven of 60 participants, or 18%, gave the correct answer. These participants included four of 20 fourth-year students, three of 20 residents in internal medicine and four of 20 attending physicians. The most common answer, given by 27, was that [the chance that a person found to have a positive result actually has the disease] was 95%.
Medical Testing Scenario
- Rare disease with prevalence of 1/1000 in population
- There is a test (e.g., antigen test) with the following properties
- False Positive Rate of 5%: If you do NOT have the disease then 5% of the time, the test says you do.
- False Negative Rate of 1%: If you DO have the disease then 1% of the time, the test says you do not have the disease.
- If you sample a person at random and they test positive, what is the chance they have the rare disease?
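Truth and Test Results
All patients fall into one of 4 categories, defined by the truth (disease or no disease) and the test result (positive or negative):
- True positive: has the disease, and the test is positive
- False negative: has the disease, but the test is negative
- False positive: does not have the disease, but the test is positive
- True negative: does not have the disease, and the test is negative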
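To make the four categories concrete, here is a minimal sketch that computes how many people would be expected in each one, using the rates from the medical testing scenario above. The population size of 100,000 is an assumption chosen only to give round numbers; it is not part of the lecture.

```python
# Expected counts per category in a hypothetical population.
population = 100_000          # assumed size, chosen for round numbers
prevalence = 1 / 1000         # P(Disease), from the scenario
false_positive_rate = 0.05    # P(Test positive | No disease)
false_negative_rate = 0.01    # P(Test negative | Disease)

diseased = population * prevalence
healthy = population - diseased

true_positives = diseased * (1 - false_negative_rate)
false_negatives = diseased * false_negative_rate
false_positives = healthy * false_positive_rate
true_negatives = healthy * (1 - false_positive_rate)

counts = {
    "True positive": true_positives,
    "False negative": false_negatives,
    "False positive": false_positives,
    "True negative": true_negatives,
}
for label, count in counts.items():
    print(f"{label:>15}: {count:,.0f}")

# Among everyone who tests positive, the share who actually have the disease.
share_diseased_among_positives = true_positives / (true_positives + false_positives)
print(f"Share of positives with the disease: {share_diseased_among_positives:.4f}")
```

Under these assumptions the false positives far outnumber the true positives, because the healthy group is so much larger than the diseased group.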
False Positive Rate
False Negative Rate
Another Scenario
- Class consists of Freshmen (60%) and Sophomores (40%)
- Some of the students have declared their major
- 50% of the Freshmen have declared their major
- 80% of the Sophomores have declared their major
- I pick one student at random ...
- That student has declared a major!
- Which is more likely: Freshman or Sophomore?
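One way to build intuition for this question is to simulate it. The sketch below draws many random students using the percentages from the scenario, keeps only the declared ones, and checks what fraction of those are Sophomores. The seed and the number of trials are arbitrary assumptions, and the standard-library `random` module is used only to keep the example self-contained.

```python
import random

rng = random.Random(108)      # fixed, arbitrary seed so the run is repeatable
trials = 100_000              # assumed number of simulated students

declared_total = 0
declared_sophomores = 0
for _ in range(trials):
    # First chance event: the student's year (40% Sophomores).
    is_sophomore = rng.random() < 0.40
    # Second chance event: declared or not, which depends on the year.
    p_declared = 0.80 if is_sophomore else 0.50
    if rng.random() < p_declared:
        declared_total += 1
        declared_sophomores += is_sophomore

estimate = declared_sophomores / declared_total
print(f"Estimated P(Sophomore | Declared): {estimate:.3f}")
```

The estimate should come out a little above one half, which suggests that a declared student is slightly more likely to be a Sophomore even though Sophomores are the smaller group.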
What do these scenarios have in common?
- There is some chance event that I am interested in
- person has a disease
- the student's year
- I start with some prior information (from before observing anything) about that quantity
- P(Disease) or P(Year)
- I then observe something whose value depends probabilistically on the original chance event
- Test is positive, student has declared
- Neither exactly determines the original event
- How do I update the probability of the original event given the additional information?
Conditional Probability
The probability of an event given some information (the probability is conditioned on that information). Example:
- “80% of Sophomores are Declared”
- P(Declared | Sophomore) = 0.8 <--- Notation
Conditional vs Joint Probabilities
- Recall the joint probability of two events:
- P(Declared, Sophomore) = chance of a random student being both declared and a Sophomore
- Conditional probability (the stuff after | is given):
- P(Declared | Sophomore) = chance of a random Sophomore student being declared
- Which one is bigger?
Answer: the conditional. We will see why in a moment.
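A short derivation, using the class scenario from earlier in the lecture (40% Sophomores, 80% of them declared), suggests why the conditional is the larger of the two. The joint probability is the conditional scaled by P(Sophomore), which is at most 1:

$$
P(\text{Declared}, \text{Sophomore}) = P(\text{Sophomore}) \cdot P(\text{Declared} \mid \text{Sophomore}) = 0.4 \times 0.8 = 0.32 \le 0.8 = P(\text{Declared} \mid \text{Sophomore})
$$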
An Example
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/e/2PACX-1vRiLsFDsuuT\
_fGEkjNJJ5Yv6MdEkWshYniIDyrzR4F4vN7UkAUgwT-MrhUTy8_gxwyhLv3rTleNScXw\
/embed?start=false&loop=false&delayms=3000', 960, 569)
Tree Diagrams
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/e/2PACX-1vTYqt2\
-0qckaBNAHfug29S4o0IV-tCrPkOp3a01wWsx65iyAmpFX3gI9ROkaZ21Syf77\
xyiIIDrGAgS/embed?start=false&loop=false&delayms=3000', 960, 569)
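The embedded slides are not reproduced in this text. As a minimal sketch of the idea behind a tree diagram, assuming the class scenario from earlier in the lecture, the code below multiplies the probabilities along each path of a two-level tree (year first, then declared status) and then compares the paths that end in "Declared". The dictionary names are illustrative and not taken from the slides.

```python
# First-level branches: the student's year.
p_year = {"Freshman": 0.60, "Sophomore": 0.40}
# Second-level branches: P(Declared | year).
p_declared_given_year = {"Freshman": 0.50, "Sophomore": 0.80}

# Probability of each leaf = product of the probabilities along its path.
leaves = {}
for year, p in p_year.items():
    p_decl = p_declared_given_year[year]
    leaves[(year, "Declared")] = p * p_decl
    leaves[(year, "Undeclared")] = p * (1 - p_decl)

for (year, status), prob in leaves.items():
    print(f"{year:>9} -> {status:<10} {prob:.2f}")

# Keep only the leaves consistent with the observation "Declared".
p_declared = sum(prob for (year, status), prob in leaves.items()
                 if status == "Declared")
p_sophomore_given_declared = leaves[("Sophomore", "Declared")] / p_declared
print(f"P(Sophomore | Declared) = {p_sophomore_given_declared:.3f}")
```

The four leaf probabilities add up to 1, and dividing the Sophomore-and-Declared leaf by the total of the Declared leaves gives the updated probability.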
Bayes' Rule
from IPython.display import IFrame
IFrame('https://docs.google.com/presentation/d/e/2PACX-1vSTI_AHfonqA-\
ww_uTioJOpF_sy8PHvEkaZ1B0ahy-KdKXygejBtQeQpIACZ0xNLnEYCfTbfkSC3Klw/\
embed?start=false&loop=false&delayms=3000', 960, 569)
A Closer Look at the Answer
Assume a patient is picked at random.
- Prior probability of disease
- P(Disease) = 0.001 = one-tenth of 1%
- Posterior probability of disease given positive test
- P(Disease | Test positive) = 0.0194... ≈ 2%
- Bigger than the prior, but still pretty small
- Should we approve such a test?
- The test has low error rates compared to most tests
- How can this be?
Assumptions Matter
- "Assume a patient is picked at random."
- But usually, people aren’t picked at random for medical tests
- So our intuition about randomly picked patients may not be great
- For a randomly picked patient, the result does make sense, because the disease is very rare.
- What if the doctor believes there is a 10% chance the patient has the disease?
Bayes' Rule and Covid Testing
Demo: Bayes' Rule
Create a function that calculates $P(A \mid B) = \frac{P(A) \cdot P(B\mid A)}{P(B)}$
def bayes_rule(pr_a, pr_b_given_a, pr_b_given_not_a):
    """
    Bayes' Rule
    P(A | B) = P(A)P(B|A) / P(B)
    To Compute P(B)
    P(B) = P(B, A) + P(B, Not A)
         = P(A)P(B|A) + P(Not A)P(B | Not A)
    """
    # Total probability of B, summed over the cases A and Not A
    pr_b = pr_a * pr_b_given_a + (1 - pr_a) * pr_b_given_not_a
    return pr_a * pr_b_given_a / pr_b
Use bayes_rule to calculate the probability for the original medical question.
pr_disease = 1/1000
pr_pos_given_disease = 0.99
pr_pos_given_no_disease = 0.05
bayes_rule(pr_disease, pr_pos_given_disease, pr_pos_given_no_disease)
0.019434628975265017
How does the conditional probability change when the prior is larger?
pr_disease_update = 100/1000
pr_pos_given_disease = 0.99
pr_pos_given_no_disease = 0.05
bayes_rule(pr_disease_update, pr_pos_given_disease, pr_pos_given_no_disease)
0.6875
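The two calculations above differ only in the prior. The sketch below repeats the same calculation for a handful of priors to show how the posterior responds. The helper name `posterior` and the list of priors are illustrative assumptions; the function mirrors `bayes_rule` with the test's error rates fixed at the values used in this demo.

```python
def posterior(prior, pr_pos_given_disease=0.99, pr_pos_given_no_disease=0.05):
    """P(Disease | Positive test) by Bayes' Rule (hypothetical helper)."""
    pr_pos = (prior * pr_pos_given_disease
              + (1 - prior) * pr_pos_given_no_disease)
    return prior * pr_pos_given_disease / pr_pos

priors = [0.001, 0.01, 0.05, 0.10, 0.50]   # illustrative values
posteriors = [posterior(p) for p in priors]

for p, post in zip(priors, posteriors):
    print(f"prior = {p:.3f}  ->  posterior = {post:.3f}")
```

Even a modest prior, such as a doctor's belief of a few percent, moves the posterior far above the roughly 2% obtained for a randomly picked patient.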
Notice how quickly the posterior probability climbs as the prior probability increases.
pr_disease = np.arange(1,999)/1000
post = bayes_rule(pr_disease, pr_pos_given_disease, pr_pos_given_no_disease)
Table().with_columns(
    "Prior Pr(Disease)", pr_disease,
    "Posterior Pr(Disease | Pos. Test)", post).iplot("Prior Pr(Disease)")
Subjective Probabilities
- The probability of an outcome can be thought of as:
- One perspective: the frequency with which it will occur in repeated trials
- Another perspective: the subjective degree of belief that it will occur (or has occurred)
- Why use subjective priors?
- To quantify a belief that is relevant to a decision
- To handle cases where the subject of your prediction was not selected randomly from the population